Abstract. The classical random matrix theory is mostly focused on asymptotic spectral
properties of random matrices as their dimensions grow to infinity. At the same time 
many recent applications from convex geometry to functional analysis to information 
theory operate with random matrices in fixed dimensions. This survey addresses the 
non-asymptotic theory of extreme singular values of random matrices with independent 
entries. We focus on recently developed geometric methods for estimating the hard edge 
of random matrices (the smallest singular value). 
1. Asymptotic and non-asymptotic problems on random matrices

Since its inception, random matrix theory has been mostly preoccupied with 
asymptotic properties of random matrices as their dimensions grow to infinity. 
A foundational example of this nature is Wigner's semicircle law [96]. It applies to
a family of $n \times n$ symmetric matrices $A_n$ whose entries on and above the diagonal
are independent standard normal random variables. In the limit as the dimension
$n$ grows to infinity, the spectrum of the normalized matrices $\frac{1}{\sqrt{n}} A_n$ is
distributed according to the semicircle law with density $\frac{1}{2\pi} \sqrt{4 - x^2}$
supported on the interval $[-2, 2]$. Precisely, if we denote by $S_n(z)$ the number of
eigenvalues of $\frac{1}{\sqrt{n}} A_n$ that are smaller than $z$, then for every
$z \in \mathbb{R}$ one has

$$\frac{S_n(z)}{n} \to \frac{1}{2\pi} \int_{-\infty}^{z} (4 - x^2)_+^{1/2} \, dx \quad \text{almost surely as } n \to \infty.$$

In a similar way, the Marchenko-Pastur law [55] governs the limiting spectrum of
$n \times n$ Wishart matrices $W_{N,n} = A^* A$, where $A = A_{N,n}$ is an $N \times n$ random
Gaussian matrix whose entries are independent standard normal random variables. As
the dimensions $N, n$ grow to infinity while the aspect ratio $n/N$ converges to a
non-random number $y \in (0, 1]$, the spectrum of the normalized Wishart matrices
$\frac{1}{N} W_{N,n}$ is distributed according to the Marchenko-Pastur law with density
$\frac{1}{2\pi x y} \sqrt{(b - x)(x - a)}$ supported on $[a, b]$, where $a = (1 - \sqrt{y})^2$,
$b = (1 + \sqrt{y})^2$. The meaning of the convergence is similar to the one in Wigner's
semicircle law.
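
As a quick sanity check on the Marchenko-Pastur density, the following sketch evaluates it numerically and confirms that it integrates to one and has mean one. The function names, the choice $y = 1/4$ and the midpoint-rule integrator are our own illustrative assumptions, not part of the cited results.

```python
import numpy as np

def mp_edges(y):
    """Endpoints a, b of the Marchenko-Pastur support for aspect ratio y in (0, 1]."""
    return (1.0 - np.sqrt(y)) ** 2, (1.0 + np.sqrt(y)) ** 2

def mp_density(x, y):
    """Density sqrt((b - x)(x - a)) / (2 pi x y) on [a, b], zero elsewhere."""
    a, b = mp_edges(y)
    x = np.asarray(x, dtype=float)
    inside = (x > a) & (x < b)
    out = np.zeros_like(x)
    xi = x[inside]
    out[inside] = np.sqrt((b - xi) * (xi - a)) / (2.0 * np.pi * xi * y)
    return out

def midpoint_integral(f, lo, hi, m=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / m
    mids = lo + h * (np.arange(m) + 0.5)
    return float(np.sum(f(mids)) * h)

y = 0.25                      # illustrative aspect ratio n/N
a, b = mp_edges(y)            # support [0.25, 2.25]
mass = midpoint_integral(lambda x: mp_density(x, y), a, b)
mean = midpoint_integral(lambda x: x * mp_density(x, y), a, b)
print(a, b, mass, mean)
```

The mean equal to one reflects the normalization by $N$: each diagonal entry of $\frac{1}{N} A^* A$ has expectation one.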

It is widely believed that phenomena typically observed in asymptotic random 
matrix theory are universal, that is independent of the particular distribution of 
the entries of random matrices. By analogy with classical probability, when we
work with independent standard normal random variables $Z_i$, we know that their
normalized sum $S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i$ is again a standard normal
random variable. This simple but useful fact becomes significantly more useful when
we learn that it is asymptotically universal. Indeed, the Central Limit Theorem states
that if, instead of the normal distribution, the $Z_i$ have a general identical distribution
with zero mean and unit variance, the normalized sum $S_n$ will still converge (in
distribution) to the standard normal random variable as $n \to \infty$. In random matrix
theory, universality has been established for many results. In particular, Wigner's
semicircle law and the Marchenko-Pastur law are known to be universal: like the Central
Limit Theorem, they hold for an arbitrary distribution of entries with zero mean and unit
variance (see [60], [6] for the semicircle law and [95], [5] for the Marchenko-Pastur law).

Asymptotic random matrix theory offers remarkably precise predictions as di- 
mension grows to infinity. At the same time, sharpness at infinity is often coun- 
terweighted by lack of understanding of what happens in finite dimensions. Let 
us briefly return to the analogy with the Central Limit Theorem. One often needs 
to estimate the sum of independent random variables $S_n$ with a fixed number of
terms $n$ rather than in the limit $n \to \infty$. In this situation one may turn to the
Berry-Esseen theorem, which quantifies deviations of the distribution of $S_n$ from that
of the standard normal random variable $Z$. In particular, if $\mathbb{E}|Z_1|^3 = M < \infty$ then

$$|\mathbb{P}(S_n \le z) - \mathbb{P}(Z \le z)| \le \frac{C}{1 + |z|^3} \cdot \frac{M}{\sqrt{n}}, \quad z \in \mathbb{R}, \tag{1.1}$$

where $C$ is an absolute constant [23]. Notwithstanding the optimality of the Berry-Esseen
inequality (1.1), one can still hope for something better than the polynomial
bound on the probability, especially in view of the super-exponential tail of the
limiting normal distribution: $\mathbb{P}(|Z| > z) \le \exp(-z^2/2)$. Better estimates would
indeed emerge in the form of exponential deviation inequalities [61], [47], but this
would only happen when we drop explicit comparisons to the limiting distribution
and study the tails of $S_n$ by themselves. In the simplest case, when $Z_i$ are i.i.d.
mean zero random variables bounded in absolute value by 1, one has

$$\mathbb{P}(|S_n| > z) \le 2 \exp(-c z^2), \quad z \ge 0, \tag{1.2}$$

where c is a positive absolute constant. Such exponential deviation inequalities, 
which are extremely useful in a number of applications, are non-asymptotic results 
whose asymptotic prototype is the Central Limit Theorem. 
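
To make the exponential deviation inequality concrete, the sketch below takes the $Z_i$ to be fair $\pm 1$ signs, an illustrative special case that we choose here. In this case the tail of $S_n$ can be computed exactly from binomial coefficients, and Hoeffding's inequality gives the bound with $c = 1/2$. The function names are ours.

```python
import math

def rademacher_tail(n, z):
    """Exact P(|S_n| > z) for S_n = (X_1 + ... + X_n) / sqrt(n), X_i fair +-1 signs."""
    total = 0
    for k in range(n + 1):            # k = number of +1 signs, so the sum is 2k - n
        if abs(2 * k - n) > z * math.sqrt(n):
            total += math.comb(n, k)
    return total / 2 ** n

def hoeffding_bound(z):
    """Right-hand side 2 exp(-z^2 / 2), i.e. the deviation bound with c = 1/2."""
    return 2.0 * math.exp(-z * z / 2.0)

n = 100
for z in (0.0, 0.5, 1.0, 2.0, 3.0, 4.0):
    print(z, rademacher_tail(n, z), hoeffding_bound(z))
```

Note that the comparison is made for a fixed number of terms $n$ and involves no limiting distribution, which is exactly the non-asymptotic viewpoint.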

A similar non-asymptotic viewpoint can be adopted in random matrix theory. 
One would then study spectral properties of random matrices of fixed dimensions. 
Non-asymptotic results on random matrices are in demand in a number of today's
applications that operate in high but fixed dimensions. This usually happens in 
statistics where one analyzes data sets with a large but fixed number of parameters, 
in geometric functional analysis where one works with random operators on finite- 
dimensional spaces (whose dimensions are large but fixed), in signal processing 
where the signal is randomly sampled in many but fixed number of points, and in 
various other areas of science and engineering. 

This survey is mainly focused on the non-asymptotic theory of the extreme sin- 
gular values of random matrices (equivalently, the extreme eigenvalues of sample 
covariance matrices) where significant progress was made recently. In Section 2
we review estimates on the largest singular value (the soft edge). The more difficult
problem of estimating the smallest singular value (the hard edge) is discussed
in Section 3, and its connection with the Littlewood-Offord problem in additive
combinatorics is the content of Section 4. In Section 5 we discuss several applications
of non-asymptotic results to the circular law in asymptotic random matrix
theory, to restricted isometries in compressed sensing, and to Kashin's subspaces 
in geometric functional analysis. 

This paper is by no means a comprehensive survey of the area but rather a 
tutorial. Sketches of some arguments are included in order to give the reader 
a flavor of non-asymptotic methods. To do this more effectively, we state most 
theorems in simplified form (e.g. always over the field R); the reader will find 
full statements in the original papers. Also, we had to completely omit several 
important directions. These include random symmetric matrices which were the 
subject of the recent survey by Ledoux [48] and random matrices with independent 
columns, see in particular [94]. The reader is also encouraged to look at the
comprehensive survey [19] on some geometric aspects of random matrix theory.



2. Extreme singular values 

Geometric nature of extreme singular values. The non-asymptotic viewpoint
in random matrix theory is largely motivated by geometric problems in high-dimensional
Euclidean spaces. When we view an $N \times n$ matrix $A$ as a linear operator
$\mathbb{R}^n \to \mathbb{R}^N$, we may want first of all to control its magnitude by placing
useful upper and lower bounds on $A$. Such bounds are conveniently provided by
the smallest and largest singular values of $A$, denoted $s_{\min}(A)$ and $s_{\max}(A)$; recall
that the singular values are by definition the eigenvalues of $|A| = \sqrt{A^* A}$.

The geometric meaning of the extreme singular values becomes clear by considering
the best possible factors $m$ and $M$ in the two-sided inequality

$$m \|x\|_2 \le \|Ax\|_2 \le M \|x\|_2 \quad \text{for all } x \in \mathbb{R}^n.$$

The largest $m$ and the smallest $M$ are precisely the extreme singular values $s_{\min}(A)$
and $s_{\max}(A)$ respectively. They control the distortion of the Euclidean geometry
under the action of the linear transformation $A$; the distance between any two
points in $\mathbb{R}^n$ can increase by at most the factor $s_{\max}(A)$ and decrease by at most
the factor $s_{\min}(A)$. The extreme singular values are clearly related to the operator
norms of the linear operators $A$ and $A^{-1}$ acting between Euclidean spaces:
$s_{\max}(A) = \|A\|$, and if $A$ is invertible then $s_{\min}(A) = 1/\|A^{-1}\|$.

Understanding the behavior of extreme singular values of random matrices is 
needed in many applications. In numerical linear algebra, the condition number
$\kappa(A) = s_{\max}(A)/s_{\min}(A)$ often serves as a measure of stability of matrix algorithms.
Geometric functional analysis employs probabilistic constructions of linear
operators as random matrices, and the success of these constructions often depends 
on good bounds on the norms of these operators and their inverses. Applications of 
different nature arise in statistics from the analysis of sample covariance matrices 
$A^* A$, where the rows of $A$ are formed by $N$ independent samples of some unknown
distribution in $\mathbb{R}^n$. Some other applications are discussed in Section 5.
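
The geometric meaning of the extreme singular values is easy to observe numerically. The following sketch, with an illustrative Gaussian matrix and dimensions of our choosing, computes $s_{\min}(A)$, $s_{\max}(A)$ and the condition number, and checks the two-sided inequality on random unit vectors.

```python
import numpy as np

def extreme_singular_values(A):
    """Return (s_min, s_max) of a matrix A with at least as many rows as columns."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values in decreasing order
    return s[-1], s[0]

rng = np.random.default_rng(0)
N, n = 200, 50                               # illustrative dimensions, N >= n
A = rng.standard_normal((N, n))
s_min, s_max = extreme_singular_values(A)
kappa = s_max / s_min                        # condition number

# Random unit vectors x in R^n (columns of X) and the stretch factors ||Ax||_2.
X = rng.standard_normal((n, 1000))
X /= np.linalg.norm(X, axis=0)
stretch = np.linalg.norm(A @ X, axis=0)

print(s_min, stretch.min(), stretch.max(), s_max, kappa)
```

Random directions typically stay well inside the interval $[s_{\min}(A), s_{\max}(A)]$; the endpoints are attained only at the extreme singular vectors.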

Asymptotic behavior of extreme singular values. We first turn to the
asymptotic theory for the extreme singular values of random matrices with independent
entries (and with zero mean and unit variance for normalization purposes).
From the Marchenko-Pastur law we know that most singular values of such a
random $N \times n$ matrix $A$ lie in the interval $[\sqrt{N} - \sqrt{n}, \sqrt{N} + \sqrt{n}]$. Under mild
additional assumptions, it is actually true that all singular values lie there, so that
asymptotically we have

$$s_{\min}(A) \sim \sqrt{N} - \sqrt{n}, \qquad s_{\max}(A) \sim \sqrt{N} + \sqrt{n}. \tag{2.1}$$

This fact is universal and it holds for general distributions. This was established for
$s_{\max}(A)$ by Geman [25] and Yin, Bai and Krishnaiah [97]. For $s_{\min}(A)$, Silverstein
[71] proved this for Gaussian random matrices, and Bai and Yin [8] gave a unified
treatment of both extreme singular values for general distributions:

Theorem 2.1 (Convergence of extreme singular values, see [5]). Let $A = A_{N,n}$
be an $N \times n$ random matrix whose entries are independent copies of some random
variable with zero mean, unit variance, and finite fourth moment. Suppose that
the dimensions $N$ and $n$ grow to infinity while the aspect ratio $n/N$ converges to
some number $y \in (0, 1]$. Then

$$\frac{1}{\sqrt{N}} s_{\min}(A) \to 1 - \sqrt{y}, \qquad \frac{1}{\sqrt{N}} s_{\max}(A) \to 1 + \sqrt{y} \quad \text{almost surely.}$$

Moreover, without the fourth moment assumption the sequence $\frac{1}{\sqrt{N}} s_{\max}(A)$
is almost surely unbounded.

The limiting distribution of the extreme singular values is known and universal.
It is given by the Tracy-Widom law whose cumulative distribution function is

$$F_1(x) = \exp\Big( -\int_x^\infty \big[ u(s) + (s - x) u^2(s) \big] \, ds \Big), \tag{2.2}$$

where $u(s)$ is the solution to the Painlevé II equation $u'' = 2u^3 + su$ with the
asymptotic $u(s) \sim \frac{1}{2\sqrt{\pi} s^{1/4}} \exp(-\frac{2}{3} s^{3/2})$ as $s \to \infty$. The occurrence
of the Tracy-Widom law in random matrix theory and several other areas was the subject of
an ICM 2002 talk of Tracy and Widom [5T]. This law was initially discovered for
the largest eigenvalue of a Gaussian symmetric matrix [89], [90]. For the largest
singular values of random matrices with independent entries it was established by
Johansson [37] and Johnstone [39] in the Gaussian case, and by Soshnikov [74]
for more general distributions. For the smallest singular value, the corresponding
result was recently obtained by Feldheim and Sodin [25], who gave a
unified treatment of both extreme singular values. These results are known under a
somewhat stronger subgaussian moment assumption on the entries $a_{ij}$ of $A$, which
requires their distribution to decay as fast as the normal random variable:

Definition 2.2 (Subgaussian random variables). A random variable $X$ is subgaussian
if there exists $K > 0$, called the subgaussian moment of $X$, such that

$$\mathbb{P}(|X| > t) \le 2 e^{-t^2/K^2} \quad \text{for } t > 0.$$

Examples of subgaussian random variables include normal random variables,
$\pm 1$-valued, and generally, all bounded random variables. The subgaussian assumption
is equivalent to the moment growth condition $(\mathbb{E}|X|^p)^{1/p} = O(\sqrt{p})$ as $p \to \infty$.
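
The moment growth condition can be checked in closed form for the two basic examples. The sketch below uses the standard formula $\mathbb{E}|Z|^p = 2^{p/2} \Gamma(\frac{p+1}{2}) / \sqrt{\pi}$ for a standard normal $Z$, and $\mathbb{E}|X|^p = 1$ for a $\pm 1$-valued $X$; the function names and the grid of exponents $p$ are illustrative choices of ours.

```python
import math

def gaussian_moment_norm(p):
    """(E|Z|^p)^(1/p) for a standard normal Z, computed through log-Gamma for stability."""
    log_moment = (0.5 * p * math.log(2.0)
                  + math.lgamma(0.5 * (p + 1))
                  - 0.5 * math.log(math.pi))
    return math.exp(log_moment / p)

def sign_moment_norm(p):
    """(E|X|^p)^(1/p) for a +-1 valued X: |X| = 1 always, so every moment equals 1."""
    return 1.0

exponents = (1, 2, 4, 16, 64, 256)
gaussian_ratios = [gaussian_moment_norm(p) / math.sqrt(p) for p in exponents]
sign_ratios = [sign_moment_norm(p) / math.sqrt(p) for p in exponents]
print(gaussian_ratios)   # stays bounded, consistent with the O(sqrt(p)) growth
print(sign_ratios)       # tends to zero: bounded variables satisfy the condition trivially
```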

Theorem 2.3 (Limiting distribution of extreme singular values, see [IS]). Let
$A = A_{N,n}$ be an $N \times n$ random matrix whose entries are independent and identically
distributed subgaussian random variables with zero mean and unit variance.
Suppose that the dimensions $N$ and $n$ grow to infinity while the aspect ratio $n/N$
stays uniformly bounded by some number $y \in (0, 1)$. Then the normalized extreme
singular values

$$\frac{s_{\min}(A)^2 - (\sqrt{N} - \sqrt{n})^2}{(\sqrt{N} - \sqrt{n})(1/\sqrt{n} - 1/\sqrt{N})^{1/3}} \quad \text{and} \quad \frac{s_{\max}(A)^2 - (\sqrt{N} + \sqrt{n})^2}{(\sqrt{N} + \sqrt{n})(1/\sqrt{n} + 1/\sqrt{N})^{1/3}}$$

converge in distribution to the Tracy-Widom law (2.2).

Non-asymptotic behavior of extreme singular values. It is not entirely
clear to what extent the limiting behavior of the extreme singular values such
as asymptotics (2.1) manifests itself in fixed dimensions. Given the geometric
meaning of the extreme singular values, our interest generally lies in establishing
correct upper bounds on $s_{\max}(A)$ and lower bounds on $s_{\min}(A)$. We start with a
folklore observation which yields the correct bound $s_{\max}(A) \lesssim \sqrt{N} + \sqrt{n}$ up to
an absolute constant factor. The proof is a basic instance of an $\varepsilon$-net argument, a
technique that has proved to be very useful in geometric functional analysis.

Proposition 2.4 (Largest singular value of subgaussian matrices: rough bound).
Let $A$ be an $N \times n$ random matrix whose entries are independent mean zero subgaussian
random variables whose subgaussian moments are bounded by 1. Then

$$\mathbb{P}\big( s_{\max}(A) > C(\sqrt{N} + \sqrt{n}) + t \big) \le 2 e^{-ct^2}, \quad t \ge 0.$$

Here and elsewhere in this paper, $C, C_1, c, c_1$ denote positive absolute constants.
Proof (sketch). We will sketch the proof for $N = n$; the general case is similar. The
expression $s_{\max}(A) = \max_{x, y \in S^{n-1}} \langle Ax, y \rangle$ motivates us to first control the random
variables $\langle Ax, y \rangle$ individually for each pair of vectors $x, y$ on the unit Euclidean
sphere $S^{n-1}$, and afterwards take the union bound over all such pairs. For fixed
$x, y \in S^{n-1}$ the expression $\langle Ax, y \rangle = \sum_{i,j} a_{ij} x_j y_i$ is a sum of independent random
variables, where $a_{ij}$ denote the independent entries of $A$. If $a_{ij}$ were standard normal
random variables, the rotation invariance of the Gaussian distribution would
imply that $\langle Ax, y \rangle$ is again a standard normal random variable. This property
generalizes to subgaussian random variables. Indeed, using moment generating
functions one can show that a normalized sum of mean zero subgaussian random
variables is again a subgaussian random variable, although the subgaussian moment
may increase by an absolute constant factor. Thus

$$\mathbb{P}(\langle Ax, y \rangle > s) \le 2 e^{-cs^2}, \quad s \ge 0.$$

Obviously, we cannot finish the argument by taking the union bound over an
infinite (even uncountable) number of pairs $x, y$ on the sphere $S^{n-1}$. In order to
reduce the number of such pairs, we discretize $S^{n-1}$ by considering its $\varepsilon$-net $\mathcal{N}_\varepsilon$ in
the Euclidean norm, which is a subset of the sphere that approximates every point
of the sphere up to error $\varepsilon$. An approximation argument yields

$$s_{\max}(A) = \max_{x, y \in S^{n-1}} \langle Ax, y \rangle \le (1 - \varepsilon)^{-2} \max_{x, y \in \mathcal{N}_\varepsilon} \langle Ax, y \rangle \quad \text{for } \varepsilon \in (0, 1).$$

To gain control over the size of the net $\mathcal{N}_\varepsilon$, we construct it as a maximal
$\varepsilon$-separated subset of $S^{n-1}$; then the balls with centers in $\mathcal{N}_\varepsilon$ and radii $\varepsilon/2$ form
a packing inside the centered ball of radius $1 + \varepsilon/2$. A volume comparison gives
the useful bound on the cardinality of the net: $|\mathcal{N}_\varepsilon| \le (1 + 2/\varepsilon)^n$. Choosing for
example $\varepsilon = 1/2$, we are well prepared to take the union bound:

$$\mathbb{P}(s_{\max}(A) > 4s) \le \mathbb{P}\Big( \max_{x, y \in \mathcal{N}_\varepsilon} \langle Ax, y \rangle > s \Big) \le |\mathcal{N}_\varepsilon|^2 \max_{x, y \in \mathcal{N}_\varepsilon} \mathbb{P}(\langle Ax, y \rangle > s) \le 5^{2n} \cdot 2 e^{-cs^2}.$$

We complete the proof by choosing $s = C\sqrt{n} + t$ with appropriate constant $C$. □
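
The approximation step of the proof can be observed numerically in the smallest nontrivial case $n = 2$, where an explicit $\varepsilon$-net of the circle $S^1$ is given by equally spaced points. The sketch below is an illustration under that assumption; it uses an explicit net rather than the maximal $\varepsilon$-separated set of the proof, and the function names are ours.

```python
import math
import numpy as np

def circle_net(eps):
    """An eps-net of the unit circle S^1 made of m equally spaced points."""
    # With angular spacing 2*pi/m every point of the circle is within angle pi/m
    # of the net, hence within chord distance 2*sin(pi/(2m)) of some net point.
    m = 3
    while 2.0 * math.sin(math.pi / (2 * m)) > eps:
        m += 1
    angles = 2.0 * math.pi * np.arange(m) / m
    return np.column_stack([np.cos(angles), np.sin(angles)])

def net_bounds(A, eps):
    """Lower and upper bounds on s_max(A) for a 2 x 2 matrix A from the net argument."""
    net = circle_net(eps)
    values = net @ A @ net.T          # entry (i, j) equals <A x_j, y_i> for net points x_j, y_i
    lower = values.max()              # maximum over the net is at most the maximum over the sphere
    upper = lower / (1.0 - eps) ** 2  # approximation argument
    return lower, upper, len(net)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
eps = 0.5
lower, upper, size = net_bounds(A, eps)
s_max = np.linalg.svd(A, compute_uv=False)[0]
print(lower, s_max, upper, size, (1 + 2 / eps) ** 2)
```

For $\varepsilon = 1/2$ the factor $(1 - \varepsilon)^{-2}$ equals 4, which is the origin of the event $\{s_{\max}(A) > 4s\}$ in the union bound.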



By integration, one can easily deduce from Proposition 2.4 the correct expectation
bound $\mathbb{E} s_{\max}(A) \le C_1(\sqrt{N} + \sqrt{n})$. This latter bound actually holds under
much weaker moment assumptions. Similarly to Theorem 2.1, the weakest possible
fourth moment assumption suffices here. R. Latala [25] obtained the following
general result for matrices with not identically distributed entries:

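Theorem 2.5 (Largest singular value: fourth moment, non-iid entries [2S]). Let $A$
be a random matrix whose entries are independent mean zero random variables
with finite fourth moment. Then

$$\mathbb{E} s_{\max}(A) \le C \Big[ \max_i \Big( \sum_j \mathbb{E} a_{ij}^2 \Big)^{1/2} + \max_j \Big( \sum_i \mathbb{E} a_{ij}^2 \Big)^{1/2} + \Big( \sum_{i,j} \mathbb{E} a_{ij}^4 \Big)^{1/4} \Big].$$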
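
To see how the three terms of this bound behave, the following sketch evaluates the bracket (without the unspecified absolute constant $C$) from arrays of second and fourth moments. The function name and the example profile, i.i.d. standard normal entries with $\mathbb{E} a_{ij}^2 = 1$ and $\mathbb{E} a_{ij}^4 = 3$, are illustrative assumptions of ours.

```python
import numpy as np

def fourth_moment_bound(second_moments, fourth_moments):
    """Bracket of the bound: max row term + max column term + fourth moment term.

    second_moments[i, j] = E a_ij^2 and fourth_moments[i, j] = E a_ij^4.
    The absolute constant C of the theorem is not included.
    """
    m2 = np.asarray(second_moments, dtype=float)
    m4 = np.asarray(fourth_moments, dtype=float)
    max_row = np.sqrt(m2.sum(axis=1).max())   # max_i (sum_j E a_ij^2)^(1/2)
    max_col = np.sqrt(m2.sum(axis=0).max())   # max_j (sum_i E a_ij^2)^(1/2)
    fourth = m4.sum() ** 0.25                 # (sum_{i,j} E a_ij^4)^(1/4)
    return max_row + max_col + fourth

N, n = 400, 100
rhs = fourth_moment_bound(np.ones((N, n)), 3.0 * np.ones((N, n)))
print(rhs, np.sqrt(N) + np.sqrt(n))
```

For identically distributed entries the first two terms are $\sqrt{n}$ and $\sqrt{N}$, and the fourth moment term $(3Nn)^{1/4}$ is at most of the same order, so the bound recovers $\mathbb{E} s_{\max}(A) \lesssim \sqrt{N} + \sqrt{n}$.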



For random Gaussian matrices, a much sharper result than in Proposition 2.4
is due to Gordon [3TJ GH 133]: 
Theorem 2.6 (Extreme singular values of Gaussian matrices, see [IH]). Let $A$ be
an $N \times n$ matrix whose entries are independent standard normal random variables.
Then

$$\sqrt{N} - \sqrt{n} \le \mathbb{E} s_{\min}(A) \le \mathbb{E} s_{\max}(A) \le \sqrt{N} + \sqrt{n}.$$

This result is a consequence of the sharp comparison inequalities for Gaussian 
processes due to Slepian and Gordon, see [23 [351 [33] an d Section 3.3]. 
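
The two-sided expectation bound for Gaussian matrices is easy to probe by simulation. The sketch below averages the extreme singular values over independent Gaussian matrices; the dimensions, the number of trials and the function name are illustrative choices, and a Monte Carlo average is of course only an approximation to the expectation.

```python
import numpy as np

def mean_extreme_singular_values(N, n, trials, seed=0):
    """Monte Carlo averages of s_min and s_max over N x n standard Gaussian matrices."""
    rng = np.random.default_rng(seed)
    s_min_sum, s_max_sum = 0.0, 0.0
    for _ in range(trials):
        s = np.linalg.svd(rng.standard_normal((N, n)), compute_uv=False)
        s_min_sum += s[-1]
        s_max_sum += s[0]
    return s_min_sum / trials, s_max_sum / trials

N, n = 100, 25
mean_min, mean_max = mean_extreme_singular_values(N, n, trials=200)
lower = np.sqrt(N) - np.sqrt(n)     # 5.0
upper = np.sqrt(N) + np.sqrt(n)     # 15.0
print(lower, mean_min, mean_max, upper)
```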

Tracy-Widom fluctuations. One can deduce from Theorem 2.6 a deviation
inequality for the extreme singular values. It follows formally by using the concentration
of measure in the Gauss space. Since $s_{\min}(A)$ and $s_{\max}(A)$ are 1-Lipschitz
functions of $A$ considered as a vector in $\mathbb{R}^{Nn}$, we have

$$\mathbb{P}\big( \sqrt{N} - \sqrt{n} - t \le s_{\min}(A) \le s_{\max}(A) \le \sqrt{N} + \sqrt{n} + t \big) \ge 1 - 2 e^{-t^2/2}, \quad t \ge 0, \tag{2.3}$$

see [19]. For general random matrices with independent bounded entries, one can
use Talagrand's concentration inequality for convex Lipschitz functions on the cube
[76], [77]. Namely, suppose the entries of $A$ are independent, have mean zero, and
are uniformly bounded by 1. Since $s_{\max}(A)$ is a convex function of $A$, Talagrand's
concentration inequality implies

$$\mathbb{P}\big( |s_{\max}(A) - \mathrm{Median}(s_{\max}(A))| \ge t \big) \le 2 e^{-ct^2}.$$

Although the precise value of the median is unknown, integration of the previous
inequality shows that $|\mathbb{E} s_{\max}(A) - \mathrm{Median}(s_{\max}(A))| \le C$. The same deviation
inequality holds for symmetric random matrices.

Inequality (2.3) is optimal for large $t$ because $s_{\max}(A)$ is bounded below by
the magnitude of every entry of $A$, which has a Gaussian tail. But for small
deviations, say for $t < 1$, inequality (2.3) is meaningless. The Tracy-Widom law predicts
a different tail behavior for small deviations $t$. It must follow the tail decay of the
Tracy-Widom function $F_1$, which is not subgaussian [3], [39]:

$$c \exp(-C \tau^{3/2}) \le 1 - F_1(\tau) \le C \exp(-C' \tau^{3/2}), \quad \tau \ge 0.$$

The concentration of this type for Hermitian complex and real Gaussian matrices
(Gaussian Unitary Ensemble and Gaussian Orthogonal Ensemble) was proved
by Ledoux [48] and Aubrun [3]. Recently, Feldheim and Sodin [25] introduced a
general approach, which allows one to prove the asymptotic Tracy-Widom law and its
non-asymptotic counterpart at the same time. Moreover, their method is applicable
to random matrices with independent subgaussian entries both in the symmetric
and non-symmetric cases. In particular, for an $N \times n$ random matrix $A$ with independent
subgaussian entries they proved that

$$p(\tau) := \mathbb{P}\big( s_{\max}(A) \ge \sqrt{N} + \sqrt{n} + \tau \sqrt{N} \big) \le C \exp(-c n \tau^{3/2}), \quad \tau \ge 0. \tag{2.4}$$

Bounds (2.3) and (2.4) show that the tail behavior of the maximal singular value is
essentially different for small and large deviations: $p(\tau)$ decays like $\exp(-c n \tau^{3/2})$
for $\tau \le c(n/N)^2$ and like $\exp(-c_1 N \tau^2)$ for larger $\tau$. For square matrices the
meaning of this phenomenon is especially clear. Large deviations of $s_{\max}(A)$
are produced by bursts of single entries: both $\mathbb{P}(s_{\max}(A) \ge \mathbb{E} s_{\max}(A) + t)$ and
$\mathbb{P}(|a_{ij}| \ge \mathbb{E} s_{\max}(A) + t)$ are of the same order $\exp(-ct^2)$ for $t \ge \mathbb{E} s_{\max}(A)$. In
contrast, for small deviations (for smaller $t$) the situation becomes truly multidimensional,
and Tracy-Widom type asymptotics appears.

The method of [25] also addresses the more difficult smallest singular value.
For an $N \times n$ random matrix $A$ whose dimensions are not too close to each other,
Feldheim and Sodin [25] proved the Tracy-Widom law for the smallest singular
value together with a non-asymptotic version of the bound $s_{\min}(A) \sim \sqrt{N} - \sqrt{n}$:

$$\mathbb{P}\Big( s_{\min}(A) \le \sqrt{N} - \sqrt{n} - \tau \sqrt{N} \cdot \frac{N}{N - n} \Big) \le \frac{C}{1 - \sqrt{n/N}} \exp(-c n \tau^{3/2}). \tag{2.5}$$

3. The smallest singular value 

Qualitative invertibility problem. In this section we focus on the behavior of
the smallest singular value of random $N \times n$ matrices with independent entries. The
smallest singular value, the hard edge of the spectrum, is generally more difficult
and less amenable to analysis by classical methods of random matrix theory than
the largest singular value, the "soft edge". The difficulty especially manifests itself
for square matrices ($N = n$) or almost square matrices ($N - n = o(n)$). For
example, we were guided so far by the asymptotic prediction $s_{\min}(A) \sim \sqrt{N} - \sqrt{n}$,
which obviously becomes useless for square matrices.

A remarkable example is provided by $n \times n$ random Bernoulli matrices $A$,
whose entries are independent $\pm 1$ valued symmetric random variables. Even the
qualitative invertibility problem, which asks to estimate the probability that $A$ is
invertible, is nontrivial in this situation. Komlós [44, 45] showed that $A$ is invertible
asymptotically almost surely: $p_n := \mathbb{P}(s_{\min}(A) = 0) \to 0$ as $n \to \infty$. Later Kahn,
Komlós and Szemerédi [43] proved that the singularity probability satisfies $p_n \le c^n$
for some $c \in (0, 1)$. The base $c$ was gradually improved in [78], [81], with the latest
record of $p_n = (1/\sqrt{2} + o(1))^n$ obtained in [T2]. It is conjectured that the dominant
source of singularity of $A$ is the presence of two rows or two columns that are equal
up to a sign, which would imply the best possible bound $p_n = (1/2 + o(1))^n$.
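
For very small $n$ the singularity probability $p_n$ can be computed exactly by enumerating all $2^{n^2}$ sign matrices. The brute-force sketch below is only an illustration of the definition (the function name is ours, and the enumeration is feasible only for $n \le 4$ or so); it says nothing about the asymptotic behavior discussed above.

```python
import itertools
import numpy as np

def singular_fraction(n):
    """Exact fraction of singular n x n matrices with +-1 entries (brute force)."""
    singular = 0
    total = 0
    for entries in itertools.product((-1.0, 1.0), repeat=n * n):
        M = np.array(entries).reshape(n, n)
        # The determinant of a +-1 matrix is an integer, so |det| < 0.5 means det = 0.
        if abs(np.linalg.det(M)) < 0.5:
            singular += 1
        total += 1
    return singular / total

for n in (1, 2, 3):
    print(n, singular_fraction(n))
```

For $n = 2$ a matrix is singular exactly when its two rows are equal up to a sign, which happens for 8 of the 16 matrices.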

Quantitative invertibility problem. The previous problem is only concerned
with whether the hard edge $s_{\min}(A)$ is zero or not. This says nothing about the
quantitative invertibility problem of the typical size of $s_{\min}(A)$. The latter question
has a long history. Von Neumann and his associates used random matrices as
test inputs in algorithms for numerical solution of systems of linear equations.
The accuracy of the matrix algorithms, and sometimes their running time as well,
depends on the condition number $\kappa(A) = s_{\max}(A)/s_{\min}(A)$. Based on heuristic
and experimental evidence, von Neumann and Goldstine predicted that

$$s_{\min}(A) \sim n^{-1/2}, \qquad s_{\max}(A) \sim n^{1/2} \quad \text{with high probability,} \tag{3.1}$$

which together yield $\kappa(A) \sim n$, see [HH Section 7.8]. In Section 2 we saw several
results establishing the second part of (3.1), for the largest singular value.

Estimating the smallest singular value turned out to be more difficult. A more
precise form of the prediction $s_{\min}(A) \sim n^{-1/2}$ was repeated by Smale [73] and
proved by Edelman [20] and Szarek [79] for random Gaussian matrices $A$, those
with i.i.d. standard normal entries. For such matrices, the explicit formula for the
joint density of the eigenvalues $\lambda_i$ of $\frac{1}{n} A^* A$ is available:

$$\mathrm{pdf}(\lambda_1, \dots, \lambda_n) = C_n \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j| \prod_{i=1}^{n} \lambda_i^{-1/2} \exp\Big( -\sum_{i=1}^{n} \lambda_i / 2 \Big).$$

Integrating out all the eigenvalues except the smallest one, one can in principle 
compute its distribution. This approach leads to the following asymptotic result: 

Theorem 3.1 (Smallest singular value of Gaussian matrices [3U]). Let $A = A_n$ be
an $n \times n$ random matrix whose entries are independent standard normal random
variables. Then for every fixed $\varepsilon \ge 0$ one has

$$\mathbb{P}\big( s_{\min}(A) \le \varepsilon n^{-1/2} \big) \to 1 - \exp(-\varepsilon - \varepsilon^2/2) \quad \text{as } n \to \infty.$$

The limiting probability behaves as $1 - \exp(-\varepsilon - \varepsilon^2/2) \sim \varepsilon$ for small $\varepsilon$. In fact,
the following non-asymptotic bound holds for all $n$:

$$\mathbb{P}\big( s_{\min}(A) \le \varepsilon n^{-1/2} \big) \le \varepsilon, \quad \varepsilon \ge 0. \tag{3.2}$$

This follows from the analysis of Edelman [20]; Sankar, Spielman and Teng [55]
provided a different geometric proof of estimate (3.2) up to an absolute constant
factor and extended it to non-centered Gaussian distributions.
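
The limiting law for the smallest singular value of a Gaussian matrix and the non-asymptotic bound by $\varepsilon$ can be compared with a simulation. In the sketch below the dimension $n = 30$, the number of trials and the function names are illustrative assumptions; the empirical frequency is only an approximation to the true probability.

```python
import numpy as np

def limiting_probability(eps):
    """Limit 1 - exp(-eps - eps^2 / 2) of P(s_min(A) <= eps * n^(-1/2))."""
    return 1.0 - np.exp(-eps - eps ** 2 / 2.0)

def empirical_probability(n, eps, trials, seed=0):
    """Empirical frequency of the event s_min(A) <= eps / sqrt(n) for Gaussian A."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        s_min = np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False)[-1]
        if s_min <= eps / np.sqrt(n):
            hits += 1
    return hits / trials

eps = 1.0
empirical = empirical_probability(n=30, eps=eps, trials=1000)
print(empirical, limiting_probability(eps), eps)
```

The limiting probability never exceeds $\varepsilon$, since $1 - e^{-u} \le u$ and $1 + \varepsilon \le e^{\varepsilon + \varepsilon^2/2}$, which is consistent with the non-asymptotic bound.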

Smallest singular values of general random matrices. These methods do
not work for general random matrices, especially those with discrete distributions,
where rotation invariance and the joint density of eigenvalues are not available.
The prediction that $s_{\min}(A) \sim n^{-1/2}$ has been open even for random Bernoulli
matrices. Spielman and Teng conjectured in their ICM 2002 talk [75] that estimate
(3.2) should hold for the random Bernoulli matrices up to an exponentially small
term that accounts for their singularity probability:

$$\mathbb{P}\big( s_{\min}(A) \le \varepsilon n^{-1/2} \big) \le \varepsilon + c^n, \quad \varepsilon \ge 0,$$

where $c \in (0, 1)$ is an absolute constant. The first polynomial bound on $s_{\min}(A)$ for
general random matrices was obtained in [63]. Later the Spielman-Teng conjecture
was proved in [65] up to a constant factor, and for general random matrices:

Theorem 3.2 (Smallest singular value of square random matrices [65]). Let $A$ be
an $n \times n$ random matrix whose entries are independent and identically distributed
subgaussian random variables with zero mean and unit variance. Then

$$\mathbb{P}\big( s_{\min}(A) \le \varepsilon n^{-1/2} \big) \le C \varepsilon + c^n, \quad \varepsilon \ge 0,$$

where $C > 0$ and $c \in (0, 1)$ depend only on the subgaussian moment of the entries.
This result addresses both qualitative and quantitative aspects of the invertibility
problem. Setting $\varepsilon = 0$ we see that $A$ is invertible with probability at
least $1 - c^n$. This generalizes the result of Kahn, Komlós and Szemerédi [43] from
Bernoulli to all subgaussian matrices. On the other hand, quantitatively, Theorem 3.2
states that $s_{\min}(A) \gtrsim n^{-1/2}$ with high probability for general random
matrices. A corresponding non-asymptotic upper bound $s_{\min}(A) \lesssim n^{-1/2}$ also
holds [66], so we have $s_{\min}(A) \sim n^{-1/2}$ as in the von Neumann-Goldstine prediction.
Both these bounds, upper and lower, hold with high probability under the weaker
fourth moment assumption on the entries [63], [66].

This theory was extended to rectangular random matrices of arbitrary dimensions
$N \times n$ in [67]. As we know from Section 2, one expects that
$s_{\min}(A) \sim \sqrt{N} - \sqrt{n}$. But this would be incorrect for square matrices. To reconcile
rectangular and square matrices we make the following correction of our prediction:

$$s_{\min}(A) \sim \sqrt{N} - \sqrt{n - 1} \quad \text{with high probability.} \tag{3.3}$$

For square matrices one would have the correct estimate
$s_{\min}(A) \sim \sqrt{n} - \sqrt{n - 1} \sim n^{-1/2}$. The following result extends Theorem 3.2 to
rectangular matrices:

Theorem 3.3 (Smallest singular value of rectangular random matrices [65]). Let $A$
be an $N \times n$ random matrix whose entries are independent and identically distributed
subgaussian random variables with zero mean and unit variance. Then

$$\mathbb{P}\big( s_{\min}(A) \le \varepsilon (\sqrt{N} - \sqrt{n - 1}) \big) \le (C \varepsilon)^{N - n + 1} + c^N, \quad \varepsilon \ge 0,$$

where $C > 0$ and $c \in (0, 1)$ depend only on the subgaussian moment of the entries.

This result has been known for a long time for tall matrices, whose aspect
ratio $\lambda = n/N$ is bounded by a sufficiently small constant, see [10]. The optimal
bound $s_{\min}(A) \ge c\sqrt{N}$ can be proved in this case using an $\varepsilon$-net argument similar
to Proposition 2.4. This was extended in [53] to $s_{\min}(A) \ge c_\lambda \sqrt{N}$ for all aspect
ratios $\lambda < 1 - c/\log n$. The dependence of $c_\lambda$ on the aspect ratio $\lambda$ was improved in
[2] for Bernoulli matrices and in [62] for general subgaussian matrices. The Feldheim-Sodin
Theorem 2.3 gives precise Tracy-Widom fluctuations of $s_{\min}(A)$ for tall
matrices, but becomes useless for almost square matrices (say for $N < n + n^{1/3}$).
Theorem 3.3 is an optimal result (up to absolute constants) which covers matrices
with all aspect ratios from tall to square. The non-asymptotic estimate (3.3) was
extended to matrices whose entries have finite $(4 + \varepsilon)$-th moment in [93].

Universality of the smallest singular values. The limiting distribution of
$s_{\min}(A)$ turns out to be universal as the dimension $n \to \infty$. We already saw a similar
universality phenomenon in Theorem 2.3 for genuinely rectangular matrices. For
square matrices, the corresponding result was proved by Tao and Vu [87]:

Theorem 3.4 (Smallest singular value of square matrices: universality [87]). Let $A$
be an $n \times n$ random matrix whose entries are independent and identically distributed
random variables with zero mean, unit variance, and finite $K$-th moment where $K$
is a sufficiently large absolute constant. Let $G$ be an $n \times n$ random matrix whose
entries are independent standard normal random variables. Then

$$\mathbb{P}(\sqrt{n} \, s_{\min}(G) \le t - n^{-c}) - n^{-c} \le \mathbb{P}(\sqrt{n} \, s_{\min}(A) \le t) \le \mathbb{P}(\sqrt{n} \, s_{\min}(G) \le t + n^{-c}) + n^{-c},$$

where $c > 0$ depends only on the $K$-th moment of the entries.

On a methodological level, this result may be compared in classical probability
theory to the Berry-Esseen theorem (1.1), which establishes polynomial deviations from
the limiting distribution, while Theorems 3.2 and 3.3 bear a similarity with large
deviation results like (1.2), which give exponentially small tail probabilities.

Sparsity and invertibility: a geometric proof of Theorem 3.2. We will now
sketch the proof of Theorem 3.2 given in [65]. This argument is mostly based on
geometric ideas, and it may be useful beyond spectral analysis of random matrices.

Looking at $s_{\min}(A) = \min_{x \in S^{n-1}} \|Ax\|_2$ we see that our goal is to bound below
$\|Ax\|_2$ uniformly for all unit vectors $x$. We will do this separately for sparse
vectors and for spread vectors with two very different arguments. Choosing a small
absolute constant $c_0 > 0$, we first consider the class of sparse vectors

$$\mathrm{Sparse} := \{ x \in S^{n-1} : |\mathrm{supp}(x)| \le c_0 n \}.$$

Establishing invertibility of $A$ on this class is relatively easy. Indeed, when we look
at $\|Ax\|_2$ for sparse vectors $x$ of fixed support $\mathrm{supp}(x) = I$ of size $|I| = c_0 n$, we
are effectively dealing with the $n \times c_0 n$ submatrix $A_I$ that consists of the columns
of $A$ indexed by $I$. The matrix $A_I$ is tall, so as we said below Theorem 3.3, its
smallest singular value can be estimated using the standard $\varepsilon$-net argument. This
gives $s_{\min}(A_I) \ge c n^{1/2}$ with probability at least $1 - 2e^{-n}$. This allows us to further
take the union bound over $\binom{n}{c_0 n} \le e^{n/2}$ choices of support $I$, and conclude that
with probability at least $1 - 2e^{-n/2}$ we have invertibility on all sparse vectors:

$$\min_{x \in \mathrm{Sparse}} \|Ax\|_2 = \min_{|I| \le c_0 n} s_{\min}(A_I) \ge c n^{1/2}. \tag{3.4}$$

We thus obtained a much stronger bound than we need, $n^{1/2}$ instead of $n^{-1/2}$.

Establishing invertibility of $A$ on non-sparse vectors is more difficult because
there are too many of them. For example, there are exponentially many vectors
on $S^{n-1}$ whose coordinates all equal $\pm n^{-1/2}$ and which have at least a constant
distance from each other. This gives us no hope to control such vectors using
$\varepsilon$-nets, as any nontrivial net must have cardinality at least $2^{cn}$. So let us now focus
on this most difficult class of extremely non-sparse vectors

$$\mathrm{Spread} := \{ x \in S^{n-1} : |x_i| \ge c_1 n^{-1/2} \text{ for all } i \}.$$

Once we prove invertibility of $A$ on these spread vectors, the argument can be
completed for all vectors in $S^{n-1}$ by an approximation argument. Loosely speaking,
if $x$ is close to Sparse we can treat $x$ as sparse, otherwise $x$ must have at least $cn$
coordinates of magnitude $|x_i| = \Theta(n^{-1/2})$, which allows us to treat $x$ as spread.
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An obvious advantage of spread vectors is that we know the magnitude of all 
their coefficients. This motivates the following geometric invertibility argument. 
If A performs extremely poor so that s m i n (^4) = 0, then one of the columns X k of 
A lies in the span H k = span(X;)^fc of the others. This simple observation can 
be transformed into a quantitative argument. Suppose x = {x\, . . . , x n ) G R" is a 
spread vector. Then, for every k = 1, . . . , n, we have 

n 

\\Ax\\ 2 > dist(Ax,H k ) = dist (y^XiX^H^j = dist(x k X k , H k ) 

i=l 

= \x k \ ■ dist(X k ,H k ) > Cl n- 1/2 dist(X k ,H k ). (3.5) 
Since the right hand side does not depend on x, we have proved that 

min \\Ax\\ 2 > c in ~ 1/2 dist(X„, (3.6) 

X'C Spread 
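
The chain of equalities in (3.5) is easy to test numerically. The following is a minimal sketch under stated assumptions: the helpers `dot`, `orthonormal_basis` and `dist_to_span` are hypothetical names, Gaussian columns are used for convenience, and the dimension $n = 6$ and index $k$ are arbitrary.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def dist_to_span(v, vectors):
    """Euclidean distance from v to span(vectors)."""
    w = list(v)
    for q in orthonormal_basis(vectors):
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return math.sqrt(dot(w, w))

random.seed(0)
n, k = 6, 2
cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]   # columns X_1, ..., X_n of A
x = [random.choice((-1, 1)) / math.sqrt(n) for _ in range(n)]       # a spread unit vector
Ax = [sum(x[i] * cols[i][r] for i in range(n)) for r in range(n)]
others = [cols[i] for i in range(n) if i != k]                      # spans H_k

lhs = dist_to_span(Ax, others)                    # dist(Ax, H_k)
rhs = abs(x[k]) * dist_to_span(cols[k], others)   # |x_k| * dist(X_k, H_k)
norm_Ax = math.sqrt(dot(Ax, Ax))
print(lhs, rhs, norm_Ax)
```

The first two printed numbers agree up to rounding, and the third is at least as large, which is the content of (3.5) for this particular $x$ and $k$.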

This reduces our task to a geometric problem of independent interest: estimate
the distance between a random vector and an independent random hyperplane.
The expectation estimate $1 \le \mathbb{E}\, \mathrm{dist}(X_n, H_n)^2 = O(1)$ follows easily by
independence and moment assumptions. But we need a lower bound with high
probability, which is far from trivial. This will make a separate story connected
to the Littlewood-Offord theory of small ball probabilities, which we discuss in
Section 4. In particular we will prove in Corollary 4.4 the optimal estimate

$$\mathbb{P}(\mathrm{dist}(X_n, H_n) \le \varepsilon) \le C\varepsilon + c^n, \quad \varepsilon \ge 0, \tag{3.7}$$

which is simple for the Gaussian distribution (by rotation invariance) and difficult
to prove e.g. for the Bernoulli distribution. Together with (3.6) this means that
we proved invertibility on all spread vectors:

$$\mathbb{P}\Bigl(\min_{x \in \mathit{Spread}} \|Ax\|_2 \le \varepsilon n^{-1/2}\Bigr) \le C\varepsilon + c^n, \quad \varepsilon \ge 0.$$

This is exactly the type of probability bound claimed in Theorem 3.2. As we said,
we can finish the proof by combining with the (much better) invertibility on sparse
vectors in (3.4), and by an approximation argument.



4. Littlewood-Offord theory 

Small ball probabilities and additive structure. We encountered the following
geometric problem in the previous section: estimate the distance between a
random vector $X$ with independent coordinates and an independent random
hyperplane $H$ in $\mathbb{R}^n$. We need a lower bound on this distance with high probability. Let
us condition on the hyperplane $H$ and let $a \in \mathbb{R}^n$ denote its unit normal vector.
Writing in coordinates $a = (a_1, \ldots, a_n)$ and $X = (\xi_1, \ldots, \xi_n)$, we see that

$$\mathrm{dist}(X, H) = |\langle a, X \rangle| = \Bigl|\sum_{i=1}^n a_i \xi_i\Bigr|. \tag{4.1}$$

We need to understand the distribution of sums of independent random variables

$$S = \sum_{i=1}^n a_i \xi_i, \quad \|a\|_2 = 1,$$

where $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ is a given coefficient vector, and $\xi_1, \ldots, \xi_n$ are
independent identically distributed random variables with zero mean and unit variance.

Sums of independent random variables are a classical theme in probability theory.
The well-developed area of large deviation inequalities like (1.2) demonstrates that
$S$ nicely concentrates around its mean. But our problem is the opposite, as we need to
show that $S$ is not too concentrated around its mean 0, and perhaps more generally
around any real number. Several results in probability theory starting from the
works of Lévy [50], Kolmogorov [42] and Esseen [24] were concerned with the
spread of sums of independent random variables, which is quantified as follows:

Definition 4.1. The Lévy concentration function of a random variable $S$ is

$$\mathcal{L}(S, \varepsilon) = \sup_{v \in \mathbb{R}} \mathbb{P}(|S - v| \le \varepsilon), \quad \varepsilon \ge 0.$$

The Lévy concentration function measures the small ball probability [51], the
likelihood that $S$ enters a small interval. For continuous distributions one can show that
$\mathcal{L}(S, \varepsilon) \lesssim \varepsilon$ for all $\varepsilon \ge 0$. For discrete distributions this may be false. Instead, a
new phenomenon arises for discrete distributions which is unseen in large deviation
theory: the Lévy concentration function depends on the additive structure of the
coefficient vector $a$. This is best illustrated on the example where $\xi_i$ are independent
Bernoulli random variables ($\pm 1$ valued and symmetric). For sparse vectors like
$a = 2^{-1/2}(1, 1, 0, \ldots, 0)$, the Lévy concentration function can be large: $\mathcal{L}(S, 0) = 1/2$.
For spread vectors, Berry-Esseen's theorem (1.1) yields a better bound:

$$\text{For } a' = n^{-1/2}(1, 1, \ldots, 1), \quad \mathcal{L}(S, \varepsilon) \le C(\varepsilon + n^{-1/2}). \tag{4.2}$$

The threshold $n^{-1/2}$ comes from many cancelations in the sums $\sum \pm 1$ which occur
because all coefficients $a_i$ are equal. For less structured $a$, fewer cancelations occur:

$$\text{For } a'' = n^{-1/2}\Bigl(1 + \frac{1}{n}, 1 + \frac{2}{n}, \ldots, 1 + \frac{n}{n}\Bigr), \quad \mathcal{L}(S, 0) \sim n^{-3/2}. \tag{4.3}$$

Studying the influence of additive structure of the coefficient vector $a$ on the spread
of $S = \sum a_i \xi_i$ became known as the Littlewood-Offord problem. It was initially
developed by Littlewood and Offord [52], Erdős and Moser [5T] H2], Sárközy and
Szemerédi [BS], Halász [3D], Frankl and Füredi [5B]. For example, if all $|a_i| \ge 1$
then $\mathcal{L}(S, 1) \le Cn^{-1/2}$ [SHU], which agrees with (4.2). Similarly, a general fact
behind (4.3) is that if $|a_i - a_j| \ge 1$ for all $i \ne j$ then $\mathcal{L}(S, 1) \le Cn^{-3/2}$ [22l EH [40].
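
For Bernoulli signs the quantity $\mathcal{L}(S, 0)$ is the largest atom of $S$ and can be computed exactly by counting sign patterns. The sketch below is only an illustration: the function name `max_atom` and the choice $n = 16$ are assumptions, and the coefficient vectors are rescaled to integers, which does not change $\mathcal{L}(S, 0)$.

```python
from collections import Counter
from fractions import Fraction

def max_atom(int_coeffs):
    """L(S, 0) for S = sum_i b_i * xi_i with integer b_i and symmetric +-1 signs xi_i:
    the largest probability that S equals a single value."""
    counts = Counter({0: 1})              # counts[s] = number of sign patterns with partial sum s
    for b in int_coeffs:
        nxt = Counter()
        for s, c in counts.items():
            nxt[s + b] += c
            nxt[s - b] += c
        counts = nxt
    return Fraction(max(counts.values()), 2 ** len(int_coeffs))

n = 16
p_sparse = max_atom([1, 1] + [0] * (n - 2))            # a = 2^{-1/2}(1, 1, 0, ..., 0)
p_flat = max_atom([1] * n)                             # a' of (4.2), rescaled by n^{1/2}
p_spread = max_atom([n + i for i in range(1, n + 1)])  # a'' of (4.3), rescaled by n^{3/2}
print(float(p_sparse), float(p_flat), float(p_spread))
```

The three values decrease in this order, in line with the heuristic that less additive structure in the coefficients means a smaller concentration function.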

New results on Lévy concentration function. Problems of invertibility of
random matrices motivated a recent revisiting of the Littlewood-Offord problem
by Tao and Vu [83] EU [86] [88], the authors [65] E2], Friedland and Sodin [27].

Additive structure of the coefficient vector $a$ is related to the shortest arithmetic
progression into which it embeds. This length is conveniently expressed as the
least common denominator $\mathrm{lcd}(a)$ defined as the smallest $\theta > 0$ such that
$\theta a \in \mathbb{Z}^n \setminus 0$. Examples suggest that the Lévy concentration function should be inversely
proportional to the least common denominator: $\mathrm{lcd}(a') = n^{1/2} \sim 1/\mathcal{L}(S, 0)$ in
(4.2) and $\mathrm{lcd}(a'') = n^{3/2} \sim 1/\mathcal{L}(S, 0)$ in (4.3). This is not a coincidence. But to
state a general result, we will need to consider a more stable version of the least
common denominator. Given an accuracy level $\alpha > 0$, we define the essential least
common denominator

$$\mathrm{lcd}_\alpha(a) := \inf\Bigl\{\theta > 0 : \mathrm{dist}(\theta a, \mathbb{Z}^n) < \min\bigl(\tfrac{1}{10}\|\theta a\|_2, \alpha\bigr)\Bigr\}.$$

The requirement $\mathrm{dist}(\theta a, \mathbb{Z}^n) < \frac{1}{10}\|\theta a\|_2$ ensures approximation of $\theta a$ by non-trivial
integer points, those in a non-trivial cone in the direction of $a$. The constant $\frac{1}{10}$
is arbitrary and it can be replaced by any other constant in $(0, 1)$. One typically
uses this concept for accuracy levels $\alpha = c\sqrt{n}$ with a small constant $c$ such as
$c = \frac{1}{10}$. The inequality $\mathrm{dist}(\theta a, \mathbb{Z}^n) < \alpha$ yields that most of the coordinates of $\theta a$
are within a small constant distance from integers. For such $\alpha$, in examples (4.2)
and (4.3) one has as before $\mathrm{lcd}_\alpha(a') \sim n^{1/2}$ and $\mathrm{lcd}_\alpha(a'') \sim n^{3/2}$. Here we state
and sketch a proof of a general Littlewood-Offord type result from [67].
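
The essential least common denominator can be approximated by a direct search over $\theta$. The following is a rough sketch, not an efficient or rigorous algorithm: it scans a uniform grid (the step size and search range are assumptions), takes the cone constant to be $1/10$ (which the text notes is arbitrary), and uses hypothetical helper names.

```python
import math

def dist_to_lattice(v):
    """Euclidean distance from v to the integer lattice Z^n."""
    return math.sqrt(sum((t - round(t)) ** 2 for t in v))

def essential_lcd(a, alpha, theta_max, step=1e-3):
    """Grid-search approximation of lcd_alpha(a): the first grid point theta with
    dist(theta * a, Z^n) < min(||theta * a||_2 / 10, alpha); None if none is found."""
    norm_a = math.sqrt(sum(t * t for t in a))
    for i in range(1, int(theta_max / step) + 1):
        theta = i * step
        if dist_to_lattice([theta * t for t in a]) < min(theta * norm_a / 10, alpha):
            return theta
    return None

n = 25
a_prime = [1 / math.sqrt(n)] * n     # the vector a' of example (4.2)
alpha = math.sqrt(n) / 10            # accuracy level alpha = c * sqrt(n) with c = 1/10
lcd_est = essential_lcd(a_prime, alpha, theta_max=2 * math.sqrt(n))
print(lcd_est, math.sqrt(n))
```

For $a'$ the search stops slightly before $\theta = n^{1/2}$, because the definition only asks $\theta a'$ to be close to an integer point rather than equal to one. This is consistent with $\mathrm{lcd}_\alpha(a') \sim n^{1/2}$.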

Theorem 4.2 (Lévy concentration function via additive structure). Let $\xi_1, \ldots, \xi_n$
be independent identically distributed mean zero random variables, which are well
spread: $p := \mathcal{L}(\xi_k, 1) < 1$. Then, for every coefficient vector $a = (a_1, \ldots, a_n) \in S^{n-1}$
and every accuracy level $\alpha > 0$, the sum $S = \sum_{i=1}^n a_i \xi_i$ satisfies

$$\mathcal{L}(S, \varepsilon) \le C\varepsilon + C/\mathrm{lcd}_\alpha(a) + Ce^{-c\alpha^2}, \quad \varepsilon \ge 0, \tag{4.4}$$

where $C, c > 0$ depend only on the spread $p$.

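Proof. A classical concentration inequality of Esseen [24] bounds the Lévy
concentration function of an arbitrary random variable $Z$ by the $L_1$ norm of its
characteristic function $\phi_Z(\theta) = \mathbb{E} \exp(i\theta Z)$ as follows:

$$\mathcal{L}(Z, 1) \le C \int_{-1}^{1} |\phi_Z(\theta)| \, d\theta. \tag{4.5}$$

One can prove this inequality using the Fourier inversion formula, see [80, Section 7.3].

We will show how to prove Theorem 4.2 for Bernoulli random variables $\xi_i$; the
general case requires an additional argument. Without loss of generality we can
assume that $\mathrm{lcd}_\alpha(a) \ge \frac{1}{\pi\varepsilon}$. Applying (4.5) for $Z = S/\varepsilon$, we obtain by independence
that

$$\mathcal{L}(S, \varepsilon) \le C \int_{-1}^{1} |\phi_S(\theta/\varepsilon)| \, d\theta = C \int_{-1}^{1} \prod_{j=1}^{n} |\phi_j(\theta/\varepsilon)| \, d\theta,$$

where $\phi_j(t) = \mathbb{E} \exp(i a_j \xi_j t) = \cos(a_j t)$. The inequality $|x| \le \exp(-\frac{1}{2}(1 - x^2))$
which is valid for all $x \in \mathbb{R}$ implies that

$$|\phi_j(t)| \le \exp\Bigl(-\frac{1}{2}\sin^2(a_j t)\Bigr) \le \exp\Bigl(-\frac{1}{2}\,\mathrm{dist}\Bigl(\frac{a_j t}{\pi}, \mathbb{Z}\Bigr)^2\Bigr).$$

Therefore

$$\mathcal{L}(S, \varepsilon) \le C \int_{-1}^{1} \exp\Bigl(-\frac{1}{2}\sum_{j=1}^{n} \mathrm{dist}\Bigl(\frac{a_j \theta}{\pi\varepsilon}, \mathbb{Z}\Bigr)^2\Bigr) d\theta = C \int_{-1}^{1} \exp\Bigl(-\frac{1}{2} f^2(\theta)\Bigr) d\theta, \tag{4.6}$$

where $f(\theta) = \mathrm{dist}\bigl(\frac{\theta}{\pi\varepsilon} a, \mathbb{Z}^n\bigr)$. Since $\mathrm{lcd}_\alpha(a) \ge \frac{1}{\pi\varepsilon}$, the definition of the essential
least common denominator implies that for every $\theta \in [-1, 1]$ we have
$f(\theta) \ge \min\bigl(\frac{|\theta|}{10\pi\varepsilon}\|a\|_2, \alpha\bigr)$. Since by assumption $\|a\|_2 = 1$, it follows that

$$\exp\Bigl(-\frac{1}{2} f^2(\theta)\Bigr) \le \exp\Bigl(-\frac{1}{2}\Bigl(\frac{\theta}{10\pi\varepsilon}\Bigr)^2\Bigr) + \exp(-\alpha^2/2).$$

Substituting this into (4.6) yields $\mathcal{L}(S, \varepsilon) \le C_1(\varepsilon + 2\exp(-\alpha^2/2))$ as required. □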
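
The two elementary inequalities used in the proof can be checked on a grid. The following is a small numerical sanity check, not a proof: it tests $|x| \le \exp(-\frac{1}{2}(1 - x^2))$ and the chain $|\cos t| \le \exp(-\frac{1}{2}\sin^2 t) \le \exp(-\frac{1}{2}\mathrm{dist}(t/\pi, \mathbb{Z})^2)$, which is the case $a_j = 1$ of the bound on $|\phi_j(t)|$. The grids and tolerances are arbitrary choices.

```python
import math

def dist_to_integers(u: float) -> float:
    """Distance from a real number u to the nearest integer."""
    return abs(u - round(u))

# |x| <= exp(-(1 - x^2) / 2) on a grid of x in [-1, 1] (equality holds at x = +-1).
xs = [i / 1000 - 1 for i in range(2001)]
ok_scalar = all(abs(x) <= math.exp(-0.5 * (1 - x * x)) + 1e-12 for x in xs)

# |cos t| <= exp(-sin^2(t) / 2) <= exp(-dist(t / pi, Z)^2 / 2) on a grid of t in [-20, 20].
ts = [i / 100 - 20 for i in range(4001)]
ok_chain = all(
    abs(math.cos(t)) <= math.exp(-0.5 * math.sin(t) ** 2) + 1e-12
    and math.exp(-0.5 * math.sin(t) ** 2)
    <= math.exp(-0.5 * dist_to_integers(t / math.pi) ** 2) + 1e-12
    for t in ts
)
print(ok_scalar, ok_chain)
```

The second inequality in the chain rests on $|\sin(\pi u)| \ge \mathrm{dist}(u, \mathbb{Z})$, which is what converts the characteristic function bound into a statement about distances to the integer lattice.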

Theorem 4.2 justifies our empirical observation that the Lévy concentration
function is proportional to the amount of structure in the coefficient vector, which is
measured by the reciprocal of its essential least common denominator. As we
said, this result is typically used for accuracy level $\alpha = c\sqrt{n}$ with some small
positive constant $c$. In this case, the term $Ce^{-c\alpha^2}$ in (4.4) is exponentially small
in $n$ (thus negligible in applications), and the term $C\varepsilon$ is optimal for continuous
distributions.

Theorem 4.2 performs best for totally unstructured coefficient vectors $a$, those
with exponentially large $\mathrm{lcd}_\alpha(a)$. Heuristically, this should be the case for random
vectors, as randomness should destroy any structure. While this is not true for
general vectors with independent coordinates (e.g. for equal coordinates with
random signs), it is true for normals of random hyperplanes:

Theorem 4.3 (Random vectors are unstructured [65]). Let $X_i$ be random vectors
in $\mathbb{R}^n$ whose coordinates are independent and identically distributed subgaussian
random variables with zero mean and unit variance. Let $a \in \mathbb{R}^n$ denote a unit
normal vector of $H = \mathrm{span}(X_1, \ldots, X_{n-1})$. Then, with probability at least $1 - e^{-cn}$,

$$\mathrm{lcd}_\alpha(a) \ge e^{cn} \quad \text{for } \alpha = c\sqrt{n},$$

where $c > 0$ depends only on the subgaussian moment.

Therefore for random normals $a$, Theorem 4.2 yields with high probability a
very strong bound on the Lévy concentration function:

$$\mathcal{L}(S, \varepsilon) \le C\varepsilon + c^n, \quad \varepsilon \ge 0. \tag{4.7}$$

This brings us back to the distance problem considered in the beginning of this 
section, which motivated our study of Levy concentration function: 

Corollary 4.4 (Distance between random vectors and hyperplanes [BS]). Let $X_i$
be random vectors as in Theorem 4.3 and $H_n = \mathrm{span}(X_1, \ldots, X_{n-1})$. Then

$$\mathbb{P}(\mathrm{dist}(X_n, H_n) \le \varepsilon) \le C\varepsilon + c^n, \quad \varepsilon \ge 0,$$

where $C, c > 0$ depend only on the subgaussian moment.

Proof. As was noticed in (4.1), we can write $\mathrm{dist}(X_n, H_n)$ as a sum of independent
random variables, and then bound it using (4.7). □

Corollary 4.4 offers us exactly the missing piece (3.7) in our proof of the
invertibility Theorem 3.2. This completes our analysis of invertibility of square matrices.

Remark. These methods generalize to rectangular matrices [67], [93]. For example,
Corollary 4.4 can be extended to compute the distance between random vectors
and subspaces of arbitrary dimension [67]: for $H_n = \mathrm{span}(X_1, \ldots, X_{n-d})$ we have
$(\mathbb{E}\, \mathrm{dist}(X_n, H_n)^2)^{1/2} = \sqrt{d}$ and

$$\mathbb{P}(\mathrm{dist}(X_n, H_n) \le \varepsilon\sqrt{d}) \le (C\varepsilon)^d + c^n, \quad \varepsilon \ge 0.$$
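
The identity $\mathbb{E}\, \mathrm{dist}(X_n, H_n)^2 = d$ can be illustrated by simulation. The sketch below is a Monte Carlo experiment under stated assumptions: Gaussian coordinates (one special case of the subgaussian setting), the sizes $n = 12$ and $d = 3$, the number of trials, and the helper name `dist_sq_to_span` are all chosen only for illustration.

```python
import math
import random

def dist_sq_to_span(v, vectors):
    """Squared distance from v to span(vectors), via modified Gram-Schmidt."""
    basis = []
    for u in vectors:
        w = list(u)
        for q in basis:
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > 1e-12:                       # skip numerically dependent vectors
            basis.append([a / norm for a in w])
    r = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return sum(a * a for a in r)

random.seed(1)
n, d, trials = 12, 3, 400
total = 0.0
for _ in range(trials):
    H = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n - d)]  # spans a subspace of codimension d
    X = [random.gauss(0, 1) for _ in range(n)]                          # independent random vector
    total += dist_sq_to_span(X, H)
mean_sq = total / trials
print(mean_sq, d)
```

The empirical mean of the squared distance is close to the codimension $d$, as the remark predicts. The small ball estimate itself concerns the lower tail of this distance and is not tested here.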



5. Applications 

The applications of non-asymptotic theory of random matrices are numerous, and 
we cannot cover all of them in this note. Instead we concentrate on three different 
results pertaining to the classical random matrix theory (Circular Law), signal 
processing (compressed sensing) , and geometric functional analysis and theoretical 
computer science (short Khinchin's inequality and Kashin's subspaces). 

Circular law. Asymptotic theory of random matrices provides an important
source of heuristics for non-asymptotic results. We have seen an illustration of 
this in the analysis of the extreme singular values. This interaction between the 
asymptotic and non-asymptotic theories goes the other way as well, as good non- 
asymptotic bounds are sometimes crucial in proving the limit laws. One remarkable 
example of this is the circular law which we will discuss now. 

Consider a family of $n \times n$ matrices $A_n$ whose entries are independent copies of
a random variable $X$ with mean zero and unit variance. Let $\mu_n$ be the empirical
measure of the eigenvalues of the matrix $B_n = \frac{1}{\sqrt{n}} A_n$, i.e. the Borel probability
measure on $\mathbb{C}$ such that $\mu_n(E)$ is the fraction of the eigenvalues of $B_n$ contained
in $E$. A long-standing conjecture in random matrix theory, which is called the
circular law, suggested that the measures $\mu_n$ converge to the normalized Lebesgue
measure on the unit disc. The convergence here can be understood in the same
sense as in Wigner's semicircle law. The circular law was originally proved by
Mehta [56] for random matrices with standard normal entries. The argument used
the explicit formula for the joint density of the eigenvalues, so it could not be extended
to other classes of random matrices. While the formulations of Wigner's semicircle
law and the circular law look similar, the methods used to prove the former are
not applicable to the latter. The reason is that the spectrum of a general matrix,
unlike that of a Hermitian matrix, is unstable: a small change of the entries may
cause a significant change of the spectrum (see [6]). Girko [30] introduced a new
approach to the circular law based on considering the real part of the Stieltjes
transform of the measures $\mu_n$. For $z = x + iy$ the real Stieltjes transform is defined by
the formula

$$S_{nr}(z) = \mathrm{Re}\Bigl(\frac{1}{n}\mathrm{Tr}(B_n - zI_n)^{-1}\Bigr) = -\frac{\partial}{\partial x}\Bigl(\frac{1}{n}\log|\det(B_n - zI)|\Bigr).$$

Since $|\det(B_n - zI)|^2 = \det(B_n - zI)(B_n - zI)^*$, this is the same as

$$S_{nr}(z) = -\frac{1}{2}\frac{\partial}{\partial x}\Bigl(\frac{1}{n}\log\det(B_n - zI)(B_n - zI)^*\Bigr) = -\frac{1}{2}\frac{\partial}{\partial x}\Bigl(\frac{1}{n}\sum_{j=1}^{n}\log s_j^{(n)}(z)\Bigr),$$

where $s_1^{(n)}(z) \ge \ldots \ge s_n^{(n)}(z) \ge 0$ are the eigenvalues of the Hermitian matrix
$(B_n - zI)(B_n - zI)^*$, or in other words, the squares of the singular values of the
matrix $V_n = B_n - zI$. Girko's argument reduces the proof of the circular law to the
convergence of real Stieltjes transforms, and thus to the behavior of the sum above.
The logarithmic function is unbounded at 0 and $\infty$. To control the behavior near
$\infty$, one has to use the bound for the largest singular value of $V_n$, which is relatively
easy. The analysis of the behavior near 0 requires bounds on the smallest singular
value of $V_n$, and is therefore more difficult.
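
The identity behind the real Stieltjes transform, namely that $\mathrm{Re}\, \frac{1}{n}\mathrm{Tr}(B - zI)^{-1}$ equals $-\frac{\partial}{\partial x}$ of $\frac{1}{n}\log|\det(B - zI)|$, can be checked directly on a list of eigenvalues. The following is a minimal sketch: the eigenvalues, the point $z$ and the finite-difference step are hypothetical values chosen for illustration, and the derivative is approximated by a central difference.

```python
import math

def real_stieltjes(eigs, z):
    """Re( (1/n) * sum_j 1 / (lambda_j - z) ), i.e. Re (1/n) Tr (B - zI)^{-1}."""
    return sum((1 / (lam - z)).real for lam in eigs) / len(eigs)

def log_potential(eigs, z):
    """(1/n) * log |det(B - zI)| = (1/n) * sum_j log |lambda_j - z|."""
    return sum(math.log(abs(lam - z)) for lam in eigs) / len(eigs)

# Hypothetical eigenvalues of some matrix B, and a point z away from all of them.
eigs = [0.3 + 0.4j, -0.5 + 0.1j, 0.2 - 0.7j, -0.1 - 0.2j]
z, h = 1.5 + 0.5j, 1e-6

# Central difference in the real direction x of z = x + iy.
d_dx = (log_potential(eigs, z + h) - log_potential(eigs, z - h)) / (2 * h)
print(real_stieltjes(eigs, z), -d_dx)
```

The logarithm blows up when $z$ approaches an eigenvalue, which is the numerical face of the difficulty described above: controlling the sum near 0 requires a lower bound on the smallest singular value of $B_n - zI$.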

Girko's approach was implemented by Bai [1], who proved the circular law for
random matrices whose entries have bounded sixth moment and bounded density.
The bounded density condition was sufficient to take care of the smallest singular
value problem. This result was the first manifestation of the universality of the
circular law. Still, it did not cover some important classes of random matrices, in
particular random Bernoulli matrices. The recent results on the smallest singular
value led to significant progress on establishing the universality of the circular
law. A crucial step was made by Götze and Tikhomirov [34] who extended the
circular law to all subgaussian matrices using [53]. In fact, the results of [33]
extended it to all random entries with bounded fourth moment. This was
further extended to random variables having bounded moment $2 + \varepsilon$ in [35], [82].
Finally, in [85] Tao and Vu proved the Circular Law in full generality, with no
assumptions besides the unit variance. Their approach was based on the smallest
singular value bound from [82] and a novel replacement principle which allowed
them to treat the other singular values.

Compressed Sensing. Non-asymptotic random matrix theory provides the right
context for the analysis of random measurements in the newly developed area of
compressed sensing, see the ICM 2006 talk of Candès [Hj-. Compressed sensing is
an area of information theory and signal processing which studies efficient
techniques to reconstruct a signal from a small number of measurements by utilizing
the prior knowledge that the signal is sparse [18].

Mathematically, one seeks to reconstruct an unknown signal $x \in \mathbb{R}^n$ from some
$m$ linear measurements viewed as a vector $Ax \in \mathbb{R}^m$, where $A$ is some known $m \times n$
matrix called the measurement matrix. In the interesting case $m < n$, the problem
is underdetermined and we are interested in the sparsest solution:

$$\text{minimize } \|x^*\|_0 \text{ subject to } Ax^* = Ax, \tag{5.1}$$

where $\|x\|_0 = |\mathrm{supp}(x)|$. This optimization problem is highly non-convex and
computationally intractable. So one considers the following convex relaxation of
(5.1), which can be efficiently solved by convex programming methods:

$$\text{minimize } \|x^*\|_1 \text{ subject to } Ax^* = Ax, \tag{5.2}$$

where $\|x\|_1 = \sum_{i=1}^n |x_i|$ denotes the $\ell_1$ norm.

One would then need to find conditions when problems (5.1) and (5.2) are
equivalent. Candès and Tao [16] showed that this occurs when the measurement
matrix $A$ is a restricted isometry. For an integer $s \le n$, the restricted isometry
constant $\delta_s(A)$ is the smallest number $\delta \ge 0$ which satisfies

$$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2 \quad \text{for all } x \in \mathbb{R}^n, \ |\mathrm{supp}(x)| \le s. \tag{5.3}$$

Geometrically, the restricted isometry property guarantees that the geometry of
$s$-sparse vectors $x$ is well preserved by the measurement matrix $A$. It turns out
that in this situation one can reconstruct $x$ from $Ax$ by the convex program (5.2):

Theorem 5.1 (Sparse reconstruction using convex programming [16]). Assume
$\delta_{2s} \le c$. Then the solution of (5.2) equals $x$ whenever $|\mathrm{supp}(x)| \le s$.

A proof with $c = \sqrt{2} - 1$ is given in [TS]; the current record is $c = 0.472$ [T3"].
The restricted isometry property can be interpreted in terms of the extreme singular
values of submatrices of $A$. Indeed, (5.3) equivalently states that the inequality

$$\sqrt{1 - \delta} \le s_{\min}(A_I) \le s_{\max}(A_I) \le \sqrt{1 + \delta}$$

holds for all $m \times s$ submatrices $A_I$, those formed by the columns of $A$ indexed by
sets $I$ of size $s$. In light of Sections 2 and 3, it is not surprising that the best known
restricted isometry matrices are random matrices. It is actually an open problem
to construct deterministic restricted isometry matrices as in Theorem 5.2 below.

The following three types of random matrices are extensively used as
measurement matrices in compressed sensing: Gaussian, Bernoulli, and Fourier. Here we
summarize their restricted isometry properties, which have a common remarkable
feature: the required number of measurements $m$ is roughly proportional to
the sparsity level $s$ rather than the (possibly much larger) dimension $n$.

Theorem 5.2 (Random matrices are restricted isometries). Let $m, n, s$ be positive
integers, $\varepsilon, \delta \in (0, 1)$, and let $A$ be an $m \times n$ measurement matrix.

1. Suppose the entries of $A$ are independent and identically distributed
subgaussian random variables with zero mean and unit variance. Assume that

$$m \ge C s \log(2n/s)$$

where $C$ depends only on $\varepsilon$, $\delta$, and the subgaussian moment. Then with probability
at least $1 - \varepsilon$, the matrix $\bar{A} = \frac{1}{\sqrt{m}} A$ is a restricted isometry with $\delta_s(\bar{A}) \le \delta$.

2. Let $A$ be a random Fourier matrix obtained from the $n \times n$ discrete Fourier
transform matrix by choosing $m$ rows independently and uniformly. Assume that

$$m \ge C s \log^4(2n), \tag{5.4}$$

where $C$ depends only on $\varepsilon$ and $\delta$. Then with probability at least $1 - \varepsilon$, the matrix
$\bar{A} = \frac{1}{\sqrt{m}} A$ is a restricted isometry with $\delta_s(\bar{A}) \le \delta$.

For random subgaussian matrices this result was proved in [9], [57] by an $\varepsilon$-net
argument, where one first checks the deviation inequality $\bigl|\|\bar{A}x\|_2^2 - 1\bigr| \le \delta$ with
exponentially high probability for a fixed unit vector $x$ as in (5.3), and afterwards lets
$x$ run over some fine net. For random Fourier matrices the problem is harder. It
was first addressed in [T7] with a slightly higher exponent than in (5.4); the exponent
4 was obtained in [64], and it is conjectured that the optimal exponent is 1.
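
For very small sparsity levels the restricted isometry constant can be computed by brute force, which makes the link with extreme singular values of column submatrices concrete. The sketch below handles only $s = 2$, where the squared singular values of each two-column submatrix are the eigenvalues of a $2 \times 2$ Gram matrix. The function name `delta_2` and the sizes $m = 64$, $n = 10$ are assumptions for illustration, and the approach does not scale to realistic $s$.

```python
import itertools
import math
import random

def delta_2(A):
    """Restricted isometry constant delta_2 of A (given as a list of rows), by brute
    force over all pairs of columns, using the eigenvalues of each 2x2 Gram matrix."""
    m, n = len(A), len(A[0])
    cols = [[A[r][j] for r in range(m)] for j in range(n)]
    inner = lambda u, v: sum(a * b for a, b in zip(u, v))
    delta = 0.0
    for i, j in itertools.combinations(range(n), 2):
        a, b, d = inner(cols[i], cols[i]), inner(cols[i], cols[j]), inner(cols[j], cols[j])
        mid, rad = (a + d) / 2, math.sqrt(((a - d) / 2) ** 2 + b * b)
        # mid - rad and mid + rad are the squared extreme singular values of A_I, I = {i, j}.
        delta = max(delta, abs(mid + rad - 1), abs(mid - rad - 1))
    return delta

random.seed(2)
m, n = 64, 10
# Normalized Bernoulli measurement matrix: entries +-1 / sqrt(m).
A = [[random.choice((-1, 1)) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
print(delta_2(A))
```

For a normalized Bernoulli matrix every column has unit norm, so in this special case $\delta_2$ reduces to the largest absolute inner product between two distinct columns.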



Short Khinchin's inequality and Kashin's subspaces. Let $1 \le p < \infty$. The
classical Khinchin's inequality states that there exist constants $A_p$, $B_p$ such that
for all $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$

$$A_p \|x\|_2 \le \Bigl(\mathop{\mathrm{Ave}}_{\varepsilon \in \{-1,1\}^n} \Bigl|\sum_{j=1}^n \varepsilon_j x_j\Bigr|^p\Bigr)^{1/p} \le B_p \|x\|_2.$$

The average here is taken over all $2^n$ possible choices of signs $\varepsilon$ (it is the same as
the expectation with respect to independent Bernoulli random variables $\varepsilon_j$). Since
the mid-seventies, the question has been around whether Khinchin's inequality holds
for averages over some small sets of signs $\varepsilon$. A trivial lower bound follows by a
dimension argument: such a set must contain at least $n$ points. Here we shall
discuss only the case $p = 1$, which is of considerable interest for computer science.
This problem can be stated more precisely as follows:

Given $\delta > 0$, find $\alpha(\delta), \beta(\delta) > 0$ and construct a set $V \subset \{-1, 1\}^n$ of
cardinality less than $(1 + \delta)n$ such that for all $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$

$$\alpha(\delta)\|x\|_2 \le \mathop{\mathrm{Ave}}_{\varepsilon \in V} \Bigl|\sum_{j=1}^n \varepsilon_j x_j\Bigr| \le \beta(\delta)\|x\|_2. \tag{5.5}$$

The first result in this direction belongs to Schechtman [TO] who found an
affirmative solution to this problem for $\delta$ greater than some absolute constant. He
considered a set $V$ consisting of $N = \lfloor (1 + \delta)n \rfloor$ independent random $\pm 1$ vectors,
which can be written as an $N \times n$ random Bernoulli matrix $A$. In the matrix
language, the inequality above reads $\alpha(\delta)\|x\|_2 \le N^{-1}\|Ax\|_1 \le \beta(\delta)\|x\|_2$ for all
$x \in \mathbb{R}^n$. This means that one can take

$$\alpha(\delta) = N^{-1} \inf_{x \in S^{n-1}} \|Ax\|_1, \quad \beta(\delta) = N^{-1} \sup_{x \in S^{n-1}} \|Ax\|_1.$$

These expressions bear a similarity to the smallest and the largest singular values
of the matrix $A$. In fact, up to the coefficient $N^{-1}$, $\beta(\delta)$ is the norm of $A$ considered
as a linear operator from $\ell_2^n$ to $\ell_1^N$, and $\alpha(\delta)$ is the reciprocal of the norm of its
inverse. Schechtman's theorem can now be derived using the $\varepsilon$-net argument.
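
For small $n$ the averages in Khinchin's inequality can be computed exactly by enumerating all sign vectors, and compared with a short average over a random set $V$. The following is a minimal sketch for $p = 1$: the helper names, the test vectors, and the choice $\delta = 0.5$ are assumptions for illustration, and a single random $V$ says nothing about the uniform bound over all $x$ that (5.5) requires.

```python
import itertools
import math
import random

def average_abs_sum(x, signs):
    """Average over the sign vectors in `signs` of |sum_j eps_j * x_j|."""
    signs = list(signs)
    return sum(abs(sum(e * t for e, t in zip(eps, x))) for eps in signs) / len(signs)

def khinchin_ratio(x):
    """Full average over {-1, 1}^n divided by ||x||_2 (the case p = 1)."""
    cube = itertools.product((-1, 1), repeat=len(x))
    return average_abs_sum(x, cube) / math.sqrt(sum(t * t for t in x))

for x in ([1, 1], [1, 1, 1, 1], [3, 1, 4, 1, 5, 9, 2, 6]):
    print(x, khinchin_ratio(x))

# Short average: N = floor((1 + delta) * n) random sign vectors instead of all 2^n.
random.seed(3)
n, delta = 8, 0.5
N = int((1 + delta) * n)
V = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(N)]
x = [3, 1, 4, 1, 5, 9, 2, 6]
short_ratio = average_abs_sum(x, V) / math.sqrt(sum(t * t for t in x))
print(N, short_ratio)
```

The full ratio never exceeds 1, by the Cauchy-Schwarz inequality applied to the average. The difficulty discussed in the text is the lower bound for the short average, uniformly over all $x$.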

The case of small $\delta$ is more delicate. For a random $A$, the bound $\beta(\delta) \le C$
can be obtained by the $\varepsilon$-net argument as before. However, an attempt to apply
this argument for $\alpha(\delta)$ runs into the same problems as for the smallest singular
value. For any fixed $\delta > 0$ the solution was first obtained by Johnson and
Schechtman [35] who showed that there exists $V$ satisfying (5.5) with $\alpha(\delta) = c^{1/\delta}$.
In [54] this was established for a random set $V$ (or a random matrix $A$) with the
same bound on $\alpha(\delta)$. Furthermore, the result remains valid even when $\delta$ depends
on $n$, as long as $\delta \ge c/\log n$. The proof uses the smallest singular value bound
from [53] in a crucial way. The bound on $\alpha(\delta)$ has been further improved in [2],
also using the singular value approach. Finally, a theorem in [62] asserts that for
a random set $V$ the inequalities (5.5) hold with high probability for

$$\alpha(\delta) = c\delta^2.$$

Moreover, the result holds for all $\delta > 0$ and $n$, without any restrictions. The proof
combines the methods of [63] and a geometric argument based on the structure of
a section of the $\ell_1^n$ ball. The probability estimate of [62] can be further improved
if one replaces the small ball probability bound of [63] with that of [65].

The short Khinchin inequality also shows that the $\ell_1$ and $\ell_2$ norms are
equivalent on a random subspace $E := A\mathbb{R}^n \subset \mathbb{R}^N$. Indeed, if $A$ is an $N \times n$
random matrix, then with high probability every vector $x \in \mathbb{R}^n$ satisfies
$\alpha(\delta)\|x\|_2 \le N^{-1}\|Ax\|_1 \le N^{-1/2}\|Ax\|_2 \le C\|x\|_2$. The second inequality here is
Cauchy-Schwarz, and the third one is the largest singular value bound. Therefore

$$C^{-1}\alpha(\delta)\|y\|_2 \le N^{-1/2}\|y\|_1 \le \|y\|_2 \quad \text{for all } y \in E. \tag{5.6}$$

Subspaces $E$ possessing property (5.6) are called Kashin's subspaces. The classical
Dvoretzky theorem states that a high-dimensional Banach space has a subspace
which is close to Euclidean [55]. The dimension of such a subspace depends on the
geometry of the ambient space. Milman proved that such subspaces always exist
in dimension $c \log n$, where $n$ is the dimension of the ambient space [58] (see also
[59]). For the space $\ell_1^n$ the situation is much better, and such subspaces exist in
dimension $(1 - \delta)n$ for any constant $\delta > 0$. This was first proved by Kashin [41], also
using a random matrix argument. Obviously, as $\delta \to 0$, the distance between the $\ell_1$
and $\ell_2$ norms on such a subspace grows to $\infty$. The optimal bound for this distance
has been found by Garnaev and Gluskin [28] who used subspaces generated by
Gaussian random matrices.

Kashin's subspaces turned out to be useful in theoretical computer science,
in particular in the nearest neighbor search [36] and in compressed sensing. At
present no deterministic construction is known of such subspaces of dimension $n$
proportional to $N$. The result of [62] shows that a $\lfloor (1 + \delta)n \rfloor \times n$ random Bernoulli
matrix defines a Kashin's subspace with $\alpha(\delta) = c\delta^2$. A random Bernoulli matrix
is computationally easier to implement than a random Gaussian matrix, while the
distance between the norms is not much worse than in the optimal case. At the
same time, since the subspaces generated by a Bernoulli matrix are spanned by
random vertices of the discrete cube, they have a relatively simple structure, which
is possible to analyze.